Collaboration networks are studied as an example of growing bipartite networks. These have previously been observed to exhibit structure such as positive correlations between nearest-neighbour degrees. However, a detailed understanding of the origin of this structure and of the growth dynamics is lacking. Both issues are analyzed empirically and simulated using various models. A new growth model is presented, incorporating empirically necessary ingredients such as bipartiteness and sublinear preferential attachment. This model and a recently proposed model of team assembly both agree roughly with some empirical observations but fail in several others.
I. INTRODUCTION

The study of networks has recently gained much attention in the physics literature [?, ?]. The physics view on networks is to consider them using the tools of statistical mechanics. The availability of large databases has made it possible to carry out empirical studies of large networks from different disciplines. A number of such networks have been identified and analyzed in the literature, with the emphasis mostly on their basic characteristics, such as the degree distribution, the clustering coefficient and the average shortest path length. However, it has also been observed that the degrees of nearest-neighbour nodes are not statistically independent but mutually correlated in practically every network imaginable [?, ?, ?, 11]. Empirically, technological and biological networks are typically found to have negative correlations, also termed disassortative mixing, whereas social networks tend to have positive correlations [3]. The forming of triads (i.e. fully connected triplets) [10], network bipartiteness [?] and a hierarchical structure of social networks [?] have been suggested as reasons for the assortative mixing. It has also been found that the presence of correlations may have consequences for the physics of dynamical models on networks [13, 14].

In this article, we take a close, empirical look at the degree-degree correlation structure of social collaboration networks. In contrast to many other networks, these are bipartite by construction. A bipartite graph is a graph with two kinds of vertices, say, A and T, in which edges only connect vertices of different kinds. The A nodes can be thought of as social actors or collaborators and the T nodes as social ties or collaboration acts. Typical examples are the movie-actor network (the movies are the collaboration acts) and scientist-article networks, where the scientists (the collaborators) appear together as authors on the articles, which play the role of collaboration acts.

From a bipartite network, one can construct its unipartite counterpart, the so-called one-mode projection onto actors (ties): a network consisting solely of the actors (ties) as nodes, two of which are connected by an edge for each social tie (actor) they both participate in (enlist as participants). For example, in the one-mode projection onto actors, two scientists are connected to each other as many times as they have co-authored a paper (an alternative definition, not considered here, would be to use this number to define a weight for the link).
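As a concrete illustration of the projection onto actors, the following Python sketch builds the multigraph projection from a collection of ties. The input format (a dict mapping a hypothetical tie identifier to the list of its actors) and the function names are assumptions made for illustration. Link multiplicities are kept as counts, and each actor's one-mode degree k counts every shared tie separately, following the multi-link convention described above.

```python
from collections import Counter
from itertools import combinations


def project_onto_actors(ties):
    """One-mode projection onto actors.

    `ties` maps a (hypothetical) tie id to the actors enlisted in it.
    Returns a Counter keyed by unordered actor pairs; the count is the
    number of ties the two actors share, i.e. the link multiplicity.
    """
    links = Counter()
    for actors in ties.values():
        # sorted() gives each pair a canonical order; set() drops duplicate listings
        for a, b in combinations(sorted(set(actors)), 2):
            links[(a, b)] += 1
    return links


def actor_degrees(links):
    """One-mode degree k of every actor, counting multiple links separately."""
    degree = Counter()
    for (a, b), multiplicity in links.items():
        degree[a] += multiplicity
        degree[b] += multiplicity
    return degree


# Two papers: A, B and C co-author p1; A and B also co-author p2.
links = project_onto_actors({"p1": ["A", "B", "C"], "p2": ["A", "B"]})
print(dict(links))                 # {('A', 'B'): 2, ('A', 'C'): 1, ('B', 'C'): 1}
print(dict(actor_degrees(links)))  # {'A': 3, 'B': 3, 'C': 2}
```

The projection onto ties is obtained the same way after swapping the roles of actors and ties in the input.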

Three important questions arise in this context. First, what is the structure of the bipartite network? The relevant quantities are stated in the next section. Second, what can be said in general about the one-mode projection graph and its correlations? We consider this mostly via the average nearest-neighbour degree (ANND). Here, the main empirical observation is that the ANND follows a power-law scaling when considered as a function of the degree of the central node. Moreover, the degree distributions decay faster than scale-free ones, and we find sublinear effective preferential attachment (PA) rules that are independent of time.

Third, we consider two models. First, a growing bipartite network model is introduced that incorporates sublinear preferential attachment. We perform simulations of this model and of a team assembly model introduced recently by Guimera et al. [15]. Both models roughly reproduce the one-mode actor degree distributions, and the latter also reproduces the power-law scaling of the actor ANND. However, the assembly model fails to match the sublinear (empirical) PA rule and the clustering as such.

This paper is organized as follows. Section II discusses the quantities measuring network topology. In Section III, results of empirical measurements are presented. Section IV reviews earlier models with similar goals. A new one is introduced in Section V. In Section VI, the new model and the earlier ones are compared to empirical measurements. Finally, Section VII ends the paper with discussion and conclusions.



II. NETWORK TOPOLOGY 

Let P(k) be the degree distribution in the one-mode projection onto actors, i.e. the probability that a randomly selected actor has k links. This quantity often exhibits a fat tail that can be approximated with a power law. The degree-degree correlations in the networks are seen from the joint probability distribution $P(k,k')$, where $(2-\delta_{k,k'})P(k,k')$ is the probability that a randomly selected edge connects nodes with degrees k and k'. In undirected graphs (which are considered here), $P(k,k')$ is necessarily symmetric with respect to k and k'. In uncorrelated networks, it takes the form

$$P(k,k') = \frac{k\,k'\,P(k)\,P(k')}{\langle k\rangle^{2}}. \tag{1}$$



The joint distribution $P(k,k')$ is often hard to measure empirically due to the lack of a representative sample, i.e. in real-life networks there are typically only a few edges connecting nodes with given degrees k and k'. Thus, other measures of the correlations have been devised, the most important of these being the average nearest-neighbour degree (ANND) $k_{nn}(k)$, which is the average degree of the nearest neighbours of nodes of degree k. It can be expressed as

$$k_{nn}(k) = \frac{\sum_{k'} k'\,P(k,k')}{\sum_{k'} P(k,k')}. \tag{2}$$

This quantity is less vulnerable to statistical fluctuations than $P(k,k')$, but naturally less informative.

The degree-degree correlations can also be described by a Pearson correlation coefficient r between nearest-neighbour degrees. It is defined as [?]

$$r = \frac{\sum_{k,k'} k\,k'\,P(k,k') - \left[\sum_{k,k'} k\,P(k,k')\right]^{2}}{\sum_{k,k'} k^{2}\,P(k,k') - \left[\sum_{k,k'} k\,P(k,k')\right]^{2}}. \tag{3}$$

If the network is uncorrelated, the ANND is a constant, $k_{nn}(k) = \langle k^2\rangle/\langle k\rangle$, and the correlation coefficient vanishes. A positive value of r and an increasing $k_{nn}(k)$ are signs of assortative mixing.
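For a network stored as an undirected edge list, both $k_{nn}(k)$ and r can be estimated by treating every edge as two ordered edge ends, which matches the symmetric convention for $P(k,k')$ above. The Python sketch below is a minimal illustration under that assumption; the function names are hypothetical, and a graph in which every node has the same degree leaves r undefined (zero variance).

```python
from collections import defaultdict


def degrees(edges):
    """Degree of every node in an undirected edge list (each link listed once)."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def annd(edges):
    """k_nn(k): mean degree of the neighbours of degree-k nodes."""
    deg = degrees(edges)
    total = defaultdict(float)  # summed neighbour degree over edge ends at degree-k nodes
    ends = defaultdict(int)     # number of such edge ends
    for u, v in edges:
        for x, y in ((u, v), (v, u)):
            total[deg[x]] += deg[y]
            ends[deg[x]] += 1
    return {k: total[k] / ends[k] for k in sorted(ends)}


def assortativity(edges):
    """Pearson correlation r between the degrees at the two ends of an edge."""
    deg = degrees(edges)
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]  # both orientations make the two marginals identical
        ys += [deg[v], deg[u]]
    n = len(xs)
    mean = sum(xs) / n
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys)) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return cov / var


path = [(0, 1), (1, 2), (2, 3)]
print(annd(path))           # {1: 2.0, 2: 1.5}
print(assortativity(path))  # approximately -0.5
```

Because both orientations of every edge are used, the estimate of r is symmetric in k and k', as Eq. (3) requires.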

Another important quantity in networks is clustering, or network transitivity, which measures the tendency to find fully connected triangles in the graph. It can be measured from several different perspectives, and the terminology varies between sources. Here, notation and terminology adapted from Ref. [?] are used, as follows.

Let $m_{nn}(x)$ be the number of links between the nearest neighbours of a given vertex x with degree k. The maximum number of such connections is $k(k-1)/2$. Define the local clustering of node x as

$$C_x = \frac{m_{nn}(x)}{k(k-1)/2}. \tag{4}$$



Now, the global clustering characteristics of the network are the following.

• The degree-dependent clustering. This is the average of the local clustering of nodes with a given degree, i.e.

$$C(k) = \langle C_x\rangle_k, \tag{5}$$

where the subscript k emphasizes the fact that the average is taken only over nodes x with degree k.

• The average clustering, which is the average of the local clustering over all nodes in the graph. It is defined in terms of the degree distribution P(k) and the degree-dependent clustering C(k) as

$$C = \sum_k P(k)\,C(k). \tag{6}$$

• The clustering coefficient, which is three times the ratio of the total number of loops of length three in the graph to the total number of connected triplets of vertices. It can also be defined in terms of P(k) and C(k) as

$$c = \frac{\sum_k k(k-1)\,P(k)\,C(k)}{\langle k^2\rangle - \langle k\rangle}. \tag{7}$$

In networks without degree-degree correlations, the three clustering characteristics in Eqs. (5), (6) and (7) equal each other:

$$C(k) = C = c = \frac{\left(\langle k^2\rangle - \langle k\rangle\right)^2}{N\,\langle k\rangle^3}, \tag{8}$$

where N is the number of vertices in the network.
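The three clustering characteristics can be computed in a single pass over the nodes. The Python sketch below assumes a simple undirected graph given as a dict of neighbour sets (multiple links collapsed), and it assigns $C_x = 0$ to nodes with k < 2, a common convention that the definitions above leave open; the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def clustering_profile(adj):
    """Return (C(k), C, c) for a simple undirected graph.

    `adj` maps each node to the set of its neighbours (assumed symmetric).
    C(k) follows Eq. (5), C Eq. (6) and c Eq. (7).
    """
    local_by_k = defaultdict(list)  # degree k -> local clusterings C_x, Eq. (4)
    closed = 0    # sum of m_nn(x) over x, i.e. three times the number of triangles
    triplets = 0  # sum of k(k-1)/2 over x, i.e. the number of connected triplets
    for x, nbrs in adj.items():
        k = len(nbrs)
        m_nn = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        pairs = k * (k - 1) // 2
        local_by_k[k].append(m_nn / pairs if pairs else 0.0)
        closed += m_nn
        triplets += pairs
    c_of_k = {k: sum(v) / len(v) for k, v in sorted(local_by_k.items())}
    n = len(adj)
    # Eq. (6): C = sum_k P(k) C(k), with P(k) = n_k / N
    c_avg = sum(len(local_by_k[k]) / n * c_of_k[k] for k in c_of_k)
    c_coef = closed / triplets if triplets else 0.0
    return c_of_k, c_avg, c_coef


# A triangle 0-1-2 with a pendant node 3 attached to node 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
c_of_k, c_avg, c_coef = clustering_profile(adj)
print(c_of_k)  # {1: 0.0, 2: 1.0, 3: 0.3333333333333333}
print(c_coef)  # 0.6
```

In this small example C (7/12) and c (3/5) already differ, illustrating that the three characteristics coincide only in the uncorrelated case of Eq. (8).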

In this work, the emphasis is on the ANND and the k-dependent clustering. These metrics probe degree-degree correlations and the density of closed loops of length three, respectively, which are considered important local characteristics of networks. These properties could also be measured using the assortativity coefficient r and the clustering coefficient c. These differ, however, from the quantities emphasized here in an important respect: $k_{nn}(k)$ and C(k) provide more detailed information on the network structure than r and c, which are merely scalar quantities that can assume the same values for several different correlation or clustering profiles. Furthermore, the statistical quality of the empirical networks appears to be high enough for these quantities to be measured reliably.



III. EMPIRICAL RESULTS 

A. Analyzed networks 

In this work, the empirically analyzed data come from two sources: the Internet Movie Database (IMDB) [18] (an older but preprocessed data set is also available at the web site [?]) and the arXiv.org preprint server [?].

The actor-movie network from the IMDB is a bipartite network consisting of actors and movies (social ties), where an actor is linked to a movie if he or she acted in it. The network is rather comprehensive, containing around 770 000 actors in about 430 000 films, the oldest of which dates back to 1890. The IMDB reports the on-screen credits as its primary data source.

network     N_a       N_t       ⟨q_a⟩   ⟨q_t⟩   ⟨k⟩    ⟨k_t⟩   r       c       C
actor       766 386   427 969   3.68    6.59    87.4   137.8   0.292   0.27    0.817
astro-ph    21 843    47 580    10.4    4.77    57.2   118.3   0.433   0.578   0.683
cond-mat    28 526    49 330    5.08    2.94    16.0   56.3    0.250   0.370   0.674
hep-ph      11 343    39 382    7.89    2.28    23.4   70.4    0.344   0.441   0.605

TABLE I: The basic parameters of the networks analyzed empirically: the number of actors $N_a$, the number of ties $N_t$, the average degree of actors and ties ($\langle q_a\rangle$ and $\langle q_t\rangle$, respectively) and in the one-mode projections onto actors ($\langle k\rangle$) and ties ($\langle k_t\rangle$), the assortativity coefficient r (Eq. (3)), the clustering coefficient c (Eq. (7)), and the average clustering C (Eq. (6)).

The arXiv.org preprint server hosts a collection of electronically available preprints in several disciplines of physics and related sciences. From such data, a bipartite graph of scientists (social actors) and articles (social ties) can be constructed. It is reasonable to assume that different disciplines are rather disconnected as far as author collaboration is concerned, so it is natural to analyze them separately. For this, the three disciplines containing most of the articles stored in the database were chosen, namely astrophysics (astro-ph), condensed matter physics (cond-mat) and the phenomenology of high energy physics (hep-ph). The number of articles in other disciplines is not large enough to permit a meaningful data analysis. The networks analyzed here contain articles up to the end of 2003. Note that although each edge in the bipartite graph is unique, multiple ties shared between the same pair of actors produce multiple, degenerate links in the one-mode projection.

Denote the degrees of actors and ties in the full bipartite representation by $q_a$ and $q_t$, respectively, and their one-mode projected counterparts by k (for the actors) and $k_t$ (for the ties). The basic parameters of the four empirical networks under study can be found in Table I. The values of the clustering coefficient, the average clustering and the assortativity coefficient differ from those in Refs. [?, ?, ?], since newer versions of the data are used. The connectivity of the networks was also studied, leading to the conclusion that all four networks consist of a giant component and a very small number of nodes outside it; the second largest component in the condensed matter network, for instance, is composed of 19 scientists. In other words, the networks are far from any kind of percolation transition, whether in the bipartite form or in the one-mode projection.
FIG. 1: The actor degree distributions of the empirical data sets. All data appear to follow a power law with exponent approximately −1.6 for small degrees, but there is a noticeable cutoff in each set.
FIG. 2: The tie degree distributions of the empirical data sets. The astrophysics collaboration network seems to exhibit a power-law distribution with an exponent of approximately −2.7. All the other data sets show an exponential decay. The solid straight lines are guides to the eye.



B. Degree distributions 

The degree distributions $P(q_a)$ of the actors in the bipartite graph are plotted in Fig. 1. Logarithmic binning is used to reduce the effect of statistical fluctuations. All four data sets can roughly be fitted by a power-law degree distribution $P(q_a) \sim q_a^{-\gamma_a}$ with $\gamma_a \approx 1.6$ in the low-degree region, but there is a pronounced high-degree cutoff. This behaviour is very similar in all the cases considered.
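Logarithmic binning of a degree sample can be sketched in Python as follows. The bin edges grow geometrically, and each count is divided by the number of integer degrees the bin contains (rather than by its continuous width), so that the estimate stays normalized for discrete degrees. The function name and the number of bins per decade are illustrative choices, not parameters taken from the measurements reported here.

```python
import bisect
import math


def log_binned_pdf(degrees, bins_per_decade=5):
    """Estimate P(q) for positive integer degrees on logarithmic bins.

    Returns (lo, hi, density) triples for the non-empty bins [lo, hi);
    summing density * (number of integers in the bin) gives one.
    """
    degrees = [d for d in degrees if d > 0]
    ratio = 10 ** (1 / bins_per_decade)
    edges = [1.0]
    while edges[-1] <= max(degrees):  # make sure the largest degree is covered
        edges.append(edges[-1] * ratio)
    counts = [0] * (len(edges) - 1)
    for d in degrees:
        counts[bisect.bisect_right(edges, d) - 1] += 1
    result = []
    for i, count in enumerate(counts):
        lo, hi = edges[i], edges[i + 1]
        n_integers = math.ceil(hi) - math.ceil(lo)  # integers d with lo <= d < hi
        if count and n_integers:
            result.append((lo, hi, count / (len(degrees) * n_integers)))
    return result
```

Plotting the densities against the geometric bin centres $\sqrt{lo \cdot hi}$ on doubly logarithmic axes gives plots of the kind shown in Fig. 1.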

The degree distributions of the ties $P(q_t)$ are depicted in Fig. 2. The movie-actor network and the cond-mat and hep-ph scientist collaboration networks show an exponentially decaying degree distribution $P(q_t) \sim e^{-q_t/q_0}$, with $q_0 \approx 3.4$, 2.0 and 1.0 for the actor, condensed matter and high energy physics data sets, respectively. The astro-ph network is an exception, since it clearly exhibits a power-law degree distribution for the ties (see the inset of Fig. 2), i.e. $P(q_t) \sim q_t^{-\gamma_t}$ with $\gamma_t \approx 2.7$. This would seem to suggest that the collaboration patterns in the astrophysics community differ essentially from those in the other disciplines studied here. However, as seen below, most of the characteristic quantities of the networks are unaffected by such a difference.

FIG. 3: The degree distributions in the one-mode projection onto actors for the different empirical data sets, as indicated by the legend. The scientist-article degree distributions are clearly not scale-free, but more reminiscent of the stretched exponential form (Eq. (9)) with $a \approx 0.5$, as indicated in the inset. The movie-actor network appears to have a power-law scaling regime around k = 100, but a more careful examination shows that its tail is also of the stretched exponential form, now with $a \approx 0.4$. The inset uses the same color coding for the different data sets as the main figure.

FIG. 4: The k-dependent clustering C(k) of Eq. (5) for the empirical data sets, as indicated by the legend. It is worth noting that for small degrees k the clustering is very high. The noisiness at high k (k > 100 for the scientist collaboration data, k > 1000 for the actor data) comes from the low statistical quality of the data in these regions (cf. the degree distributions in Fig. 3).

The degree distributions in the one-mode projection onto actors are shown in Fig. 3. From the figure we see that the degree distribution of the actor-movie network has a lump in the lower-degree region, which is also somewhat visible in Fig. 2, and a short power-law region around k = 100. However, a careful look reveals that the tail follows the stretched exponential form

$$P(k) \propto k^{-a}\exp\!\left(-\frac{\mu}{1-a}\,k^{1-a}\right), \tag{9}$$

where μ depends on a and satisfies $1 < \mu < 2$ [?]. For networks with this kind of degree distribution, the logarithm of the cumulative distribution function (shown in the inset of Fig. 3) is a power law of the degree k. The inset clearly shows that this is the case here, with $a \approx 0.5$ in the scientist collaboration networks and $a \approx 0.4$ in the actor network. The observations made here about the degree distributions are compatible with previous studies [?, 24].
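The power-law behaviour of the logarithm of the cumulative distribution follows directly from Eq. (9). As a short check in the continuum approximation (treating k as continuous, and reading the cumulative distribution as the complementary one, $P_>(k)$, the probability that a degree exceeds k), note that the argument of the exponential has derivative $-\mu k^{-a}$, so the right-hand side of Eq. (9) is an exact derivative:

$$\frac{d}{dk}\exp\!\left(-\frac{\mu}{1-a}\,k^{1-a}\right) = -\mu\,k^{-a}\exp\!\left(-\frac{\mu}{1-a}\,k^{1-a}\right) \;\Rightarrow\; P_>(k) = \int_k^{\infty} P(k')\,dk' \propto \exp\!\left(-\frac{\mu}{1-a}\,k^{1-a}\right),$$

$$-\ln P_>(k) = \frac{\mu}{1-a}\,k^{1-a} + \text{const}.$$

Hence, apart from the additive constant, $-\ln P_>(k)$ grows as $k^{1-a}$, and its doubly logarithmic plot against k approaches a straight line of slope $1-a$ in the tail, e.g. 0.5 for $a \approx 0.5$.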

C. Clustering and correlations 

The degree-dependent clustering C(k) (naturally, in the one-mode projection) of Eq. (5) is plotted in Fig. 4, from which we see that the clustering is substantial (very close to one) for vertices with small degrees and decreases with increasing k. The low-k behaviour is expected, since actors with a small k are likely to be connected only to collaborators sharing a single tie, in which case the local clustering equals one. Furthermore, C(k) is also expected to be monotonically decreasing, because the more collaborators a node has, the less probable it is for those collaborators to be connected with each other.
FIG. 5: The average nearest-neighbour degrees (ANND) $k_{nn}(k)$ as a function of the node degree k for the empirical data sets. For each network, a power law with exponent $\beta \approx 0.3$ can be roughly fitted to the data. This power-law behaviour is the most important empirical observation made here. Since the ANND is an increasing function of k, there is assortative mixing in the networks, as is typical of social networks.
The average nearest-neighbour degrees $k_{nn}(k)$ (Eq. (2)) as a function of node degree k are plotted in Fig. 5. All networks behave similarly with respect to this quantity; a power-law scaling

$$k_{nn}(k) \sim k^{\beta} \tag{10}$$

with $\beta \approx 0.3$ is observed in each one, with some small deviations. At high degrees k, a cutoff, possibly a trace of the finite size of the networks, can be seen in each data set.
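The exponent β in Eq. (10) can be estimated by an ordinary least-squares fit of $\ln k_{nn}$ against $\ln k$. The fitting procedure behind the quoted value is not specified here, so the Python sketch below is only one plausible choice; in practice the fit range should stop before the high-k cutoff, and the function name and synthetic data are illustrative.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of ln y = ln A + beta ln x; returns (beta, A)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    sxx = sum((u - mean_x) ** 2 for u in lx)
    beta = sxy / sxx
    amplitude = math.exp(mean_y - beta * mean_x)
    return beta, amplitude


# Synthetic k_nn(k) = 5 k^0.3, sampled on a roughly logarithmic grid of degrees.
ks = [1, 2, 5, 10, 20, 50, 100]
knn = [5 * k ** 0.3 for k in ks]
beta, amplitude = fit_power_law(ks, knn)
print(round(beta, 6), round(amplitude, 6))  # 0.3 5.0
```

Sampling k on a logarithmic grid (or using log-binned data) keeps the many low-degree points from dominating the fit.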

Since the ANND is an increasing function of k, one can conclude that significant assortative mixing is present in the networks, i.e. the degrees of adjacent nodes are positively correlated. This can also be seen from the empirically measured assortativity coefficients r in Table I. The differences in the amplitudes of the ANND curves in Fig. 5 are explained by the differences in the average degree $\langle k\rangle$ of the networks.



D. Preferential attachment 

From the empirical data, the (one-mode) preferential attachment (PA) rule can also be measured quite straightforwardly, given that the order of appearance (or preparation) of the articles or movies can be deduced from the available data, which is the case here. The measurement method was devised by Newman [24] and goes as follows. Denote the degree-dependent preferential attachment rule, i.e. the bias to select actors of degree k, by $T_k$. Given such a rule, the time-dependent probability that a node added to the network at time t connects to a node of degree k is given by

$$p_k(t) = \frac{T_k\,n_k(t-1)}{N(t-1)}, \tag{11}$$

where $n_k(t-1)$ is the number of nodes with degree k right before the addition of the new node and $N(t-1)$ is the total number of nodes in the graph at the same time. Given these quantities, the preferential attachment rule $T_k$ can be measured by accumulating a histogram over the degree k of the node to which each new link is attached, with each entry weighted by $N(t-1)/n_k(t-1)$.

If the attachment is non-preferential, $T_k$ is independent of k. With preferential attachment, on the other hand, $T_k$ is a growing function of the degree k; for instance, in the Barabási-Albert model [?], $T_k \propto k$ by definition.
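The histogram method can be sketched in Python for the simplest setting, in which each event creates one link from a (possibly new) node to an existing target. The event format, the seed network and the function name are assumptions for illustration; for the bipartite collaboration data, the events would be the one-mode links created when a new tie enlists pre-existing actors. The returned values are unnormalized, in line with $T_k$ being a relative probability.

```python
from collections import Counter, defaultdict


def measure_pa_rule(seed_edges, events):
    """Histogram estimate of T_k (Eq. (11)) from a chronological list of events.

    Each event (source, target) creates one link; `target` must already exist.
    Every event adds the weight N(t-1) / n_k(t-1) at the target's degree k.
    """
    degree = Counter()
    for u, v in seed_edges:
        degree[u] += 1
        degree[v] += 1
    n_k = Counter(degree.values())  # number of nodes with each degree
    hist = defaultdict(float)

    def add_link_end(node):
        k = degree[node]
        if k > 0:
            n_k[k] -= 1
        degree[node] = k + 1
        n_k[k + 1] += 1

    for source, target in events:
        if target not in degree:
            raise ValueError(f"target {target!r} is not yet in the network")
        k = degree[target]               # degree right before the new link
        hist[k] += len(degree) / n_k[k]  # weight N(t-1) / n_k(t-1)
        add_link_end(source)
        add_link_end(target)
    return dict(hist)


# Seed link 0-1, then nodes 2 and 3 both attach to node 0.
print(measure_pa_rule([(0, 1)], [(2, 0), (3, 0)]))  # {1: 1.0, 2: 3.0}
```

With enough events, plotting the histogram against k on doubly logarithmic axes gives curves of the kind shown in Fig. 6, from which the exponent of $T_k \sim k^{a}$ can be read off.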

The empirically measured PA rules are plotted in Fig. 6. All of them are well fitted by power laws $T_k \sim k^{a}$ with high-k cutoffs. For the actor-movie network the measured exponent is $a \approx 0.65$, for the astrophysics network $a \approx 0.6$, and for the other networks $a \approx 0.75$. Different decades (movies) or years (articles) of accumulation of the data are shown separately to illustrate that the effective PA rule is independent of time. Note that the amplitudes of the plotted curves are irrelevant, since $T_k$ is a relative probability. At low degrees (k < 10), the behaviour of the actor data set differs from that of the other ones: $T_k$ is approximately constant in this region. This means that for low degrees the actual degree value becomes irrelevant, which may indicate a sublinear version of Eq. (12). Note also that we have observed different exponents a for different data sets but the same numerical value of the ANND exponent β, effectively ruling out a direct connection between the ANND and $T_k$.

The position of the cutoff increases as a function of time, and thus as a function of the network size. A similar cutoff can be observed when measuring the PA rule retroactively for a numerically generated network, so we conclude that the cutoff is merely a finite-size effect, which does not need to be taken into account explicitly when building a simulation model. Similarly, in the team assembly model the retroactively measured PA rule is a power law with a cutoff, but with $a \approx 0.4$, independent of the simulation parameters. We have not attempted to measure $T_k$ for the one-mode projection onto ties, though it would naturally be of some interest.

Measurements of the preferential attachment rule in the arXiv.org collaboration networks were also reported by Newman in Ref. [24], using the same measurement method as in this work. Surprisingly, Newman concludes that the preferential attachment is linear, which is in striking contrast to the results obtained here. Since the data sets and the measurement method are apparently the same, there remains only one possible explanation: Newman considered the arXiv.org network as a whole, whereas in this work it is divided into disciplines. Barabási et al. [21] have also measured the PA rule, but for different networks; they find exponents 0.75 and 0.8 for the neuroscience and mathematics scientist-article collaboration networks, respectively.



E. The one-mode projection onto ties 



The degree distribution and the average nearest-neighbour degree in the one-mode projection onto social ties are shown in Figs. 7 and 8, respectively. The degree distributions in the scientist collaboration (article) networks are quite interesting: there is a practically constant region up to $k_t \approx 100$ and a relatively rapid fall at larger degrees. The movie network, on the other hand, appears to behave differently: there is an approximate power law with slope around −0.6 for small $k_t$, and no region of constant probability can be observed.

Analogously to the projection onto actors, the average nearest-neighbour degree in this projection approximately scales as a power of the degree $k_t$. The measured exponent is $\beta_t \approx 0.44$.
FIG. 6: The measured effective preferential attachment (PA) rules for (a) the actor-movie network in the 1950s (o), 1960s (□), 1970s ( ), 1980s (A), 1990s (<) and 2000s (V), and for (b) the astrophysics, (c) the condensed matter physics and (d) the high energy physics scientist-article networks during the years 1998 (o), 1999 (□), 2000 (0), 2001 (A), 2002 ( ]) and 2003 (V). The PA rules are well fitted by a power law $T_k \sim k^{a}$ with a cutoff, and they appear to be independent of time. Numerically, we observe $a \approx 0.65$ for the actor network, $a \approx 0.6$ for the astrophysics network and $a \approx 0.75$ for the other networks. The solid lines are guides to the eye.



IV. PREVIOUS MODELS 

Earlier studies of collaboration networks mostly center on unipartite networks, with a few exceptions [?]. Growing unipartite networks are relevant to the present work since they show how the degree distribution depends on the growth rule. For example, uniform attachment yields an exponential degree distribution [?], whereas a linear preferential attachment rule yields a power-law degree distribution [?]. Between these two extremes lies sublinear preferential attachment, i.e. a network growth rule stating that a new node connects to an existing one with probability proportional to $k^{a}$ ($0 < a < 1$), where k is the degree of the existing node. This kind of growth leads to a stretched exponential degree distribution of the form of Eq. (9).

Another family of preferential attachment rules is given by attachment probabilities $\Pi_k$ of the form

$$\Pi_k \propto k + A, \tag{12}$$

where A is a parameter also termed additional attractiveness [?]. This leads to scale-free networks with a tunable, A-dependent degree distribution exponent γ. The networks develop degree correlations such that for A > 0, $k_{nn}(k) \sim \log k$, whereas for A < 0 a decaying power law $k_{nn}(k) \sim k^{\beta}$ with β < 0 is recovered [30].

The reference model is the bipartite configuration model [?, ?]. In it, all actors and ties are created first, assigned degrees from given degree distributions and linked randomly such that the degrees are fulfilled. It has been proven mathematically that this model always leads to a non-negative assortativity coefficient. Despite this, the correlations are clearly too weak to explain those in the empirical data [?].

Recently, Ramasco et al. [?] have introduced a model of a growing bipartite collaboration network, which is defined as follows. At each time step, a new tie with n actors is added to the network. Of these, m (< n) are new, i.e. are not currently part of the network. The remaining n − m are chosen from the set of pre-existing actors with probability proportional to the number of ties q they have already participated in, i.e. by using linear preferential attachment. Ramasco et al. arrive at a scale-free degree distribution $P(k) \sim k^{-\gamma}$ in the one-mode projection onto actors, with

$$\gamma = 2 + \frac{m}{n-m}. \tag{13}$$

While the model above can be solved analytically, it fails to explain some features when compared to empirical data [?]. The most important deviations are in clustering and correlations. To overcome this, Ramasco and co-workers introduced the aging of actors as an additional property of the model. In their model, the actors age such that the probability that an actor is alive (i.e. capable of participating in new ties) after having participated in q ties is

$$P_{\mathrm{alive}}(q) = \begin{cases} 1, & q \le q_0, \\ \exp\!\left(-\dfrac{q-q_0}{\tau}\right), & q > q_0. \end{cases} \tag{14}$$

Survival is thus certain up to participation in $q_0$ ties, followed by an exponential decay with a characteristic time τ.

Introducing the aging renders the model analytically unsolvable, and the degree distribution develops a cutoff at large values of the one-mode degree k. A similar cutoff is observed empirically, and Ramasco and co-workers use its position and steepness to determine the values of the aging parameters $q_0$ and τ. With the aging, one is able to make the assortativity coefficient (Eq. (3)) positive and increase the clustering coefficient (Eq. (7)) so that both get closer to the empirically observed values.

Guimera et al. [15] have introduced a model of team assembly mechanisms that is quite similar to the one discussed above. In their model, new teams are formed and their members are selected according to the following rules. At each time step, a new team with m members is created. For each member, an incumbent, i.e. a member that is already part of the network, is chosen with probability p, and a newcomer is created with probability 1 − p. If the previous member was an incumbent, the next one is selected from its collaborators with probability q; otherwise the selection is performed as above.

Identifying the teams with the social ties and the team members with the social actors, the team assembly model is a version of a growing bipartite network model. In it, the preferential attachment rule consists of two ingredients: the first incumbent member of each team, and the subsequent ones with probability 1 − q, are selected with random attachment, whereas the rest are chosen using linear preferential attachment, which comes into play implicitly here since a random previous collaborator of a member is chosen [10]. Guimera et al. have found that the degree distribution of the one-mode projection of the resulting graph mimics empirically measured distributions reasonably well [15].

Somewhat similar studies have also been conducted by Börner et al. [?], Goldstein et al. [?] and Morris [?]. A common goal of these is to introduce realistic models of collaboration networks. However, they do not pay any attention to degree-degree correlations. Similar ideas have also been applied to the bipartite network of research projects funded by the European Union and the organizations participating in them [29].

FIG. 7: The degree distributions in the one-mode projection onto ties. The scientist-article networks have a region up to $k_t \approx 100$ where the probability density is practically constant, followed by a rapid decay.

FIG. 8: The average nearest-neighbour degrees in the one-mode projection onto ties. A common property of the scientist collaboration networks is an approximate power-law scaling.


V. SUBLINEAR MODEL 

Motivated by the empirical observations of Sec. III, the following model of a growing bipartite collaboration network is proposed. It is related to that of Ramasco et al. [?] (see also Sec. IV), but it is significantly modified in order to be consistent with empirical facts.

The model is as follows (the addition of a new tie is illustrated in Fig. [?]):

• At each time step, a new tie is added to the network. The number of actors n of this tie is a random variable whose distribution is given as an input parameter. Since this distribution can be measured empirically, and the measured functional form is used, no fitting is involved.

• Each of these n actors has a given probability p of being a new one, i.e. at fixed n the number of new actors is a random variable with a binomial distribution. The probability p can be estimated from the data by selecting it such that, on average, the total number of actors at the end of the simulation equals that in the empirical network being mimicked.

• Of the remaining n − m actors, with m being the number of new actors, the first is chosen from the pool of all pre-existing actors such that actor i gets chosen with probability

$$\Pi_i = \frac{q_i^{a}}{\sum_j q_j^{a}},$$

i.e. with sublinear preferential attachment. The corresponding rule of the model of Ramasco et al. has a = 1.

• Each of the remaining n − m − 1 actors is chosen, with probability $p_{tf}$, from the set of earlier collaborators of the previously chosen actor, also with sublinear preferential attachment, and, with probability $1 - p_{tf}$, as described in the previous point.
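The four rules above can be sketched in Python. The sketch assumes that the preferential attachment weight of actor i is $q_i^{a}$, with $q_i$ the number of ties it has already participated in (so that a = 1 gives the linear rule of Ramasco et al.). Further assumptions: the tie-size distribution is supplied as a user-defined sampler `sample_n` (a hypothetical stand-in for the empirical distribution), actors already in the tie are excluded, the global rule is used when the previous actor has no eligible earlier collaborators, and any slots left when no pre-existing actor is available (e.g. for the first tie) are filled with new actors. Each draw costs O(N), which is fine for a sketch but too slow for production-size runs.

```python
import random
from itertools import combinations


def sublinear_model(num_ties, sample_n, p, a, p_tf, seed=None):
    """Grow a bipartite actor-tie network; returns the list of ties (actor ids)."""
    rng = random.Random(seed)
    q = {}        # actor -> number of ties it has participated in
    earlier = {}  # actor -> set of collaborators from earlier ties
    ties = []
    next_id = 0

    def pa_choice(pool, exclude):
        """Choose from `pool` with probability proportional to q_i ** a."""
        candidates = [i for i in pool if i not in exclude]
        if not candidates:
            return None
        return rng.choices(candidates, weights=[q[i] ** a for i in candidates])[0]

    for _ in range(num_ties):
        n = sample_n(rng)
        m = sum(rng.random() < p for _ in range(n))  # binomial number of new actors
        pool = list(q)                                # actors present before this tie
        tie, prev = [], None
        for _ in range(n - m):
            pick = None
            if prev is not None and rng.random() < p_tf:
                pick = pa_choice(earlier[prev], tie)  # triad formation step
            if pick is None:
                pick = pa_choice(pool, tie)           # global sublinear PA
            if pick is None:
                break                                 # no eligible pre-existing actor
            tie.append(pick)
            prev = pick
        while len(tie) < n:                           # new actors (and any shortfall)
            q[next_id], earlier[next_id] = 0, set()
            tie.append(next_id)
            next_id += 1
        for i in tie:
            q[i] += 1
        for i, j in combinations(tie, 2):
            earlier[i].add(j)
            earlier[j].add(i)
        ties.append(tie)
    return ties


# Hypothetical tie-size sampler standing in for the measured distribution.
ties = sublinear_model(2000, lambda rng: rng.choice([1, 2, 2, 3, 3, 4]),
                       p=0.2, a=0.75, p_tf=0.9, seed=7)
```

Setting `p_tf=0.0` switches off the triad formation step, and `a=1.0` recovers linear preferential attachment on the tie counts.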

The most important new ingredient, the sublinear form of the PA rule, is justified by the measurements in Sec. III D. Further motivation comes from the fact that degree distributions which are not pure power laws have been measured (see Fig. 3 in Sec. III B), and such degree distributions have been shown to result from sublinear PA. Still further motivation is given by the studies of Onody and de Castro [?], where sublinear PA was found to lead to positive degree-degree correlations in terms of the assortativity coefficient r.

The motivation behind the triad formation (TF) process described in the last rule is that several models incorporating this kind of behavior have been found to produce increased clustering, closer to what is observed in real networks [10]. A similar ingredient is also present in the team assembly model [15], where the parameter q plays the role of the triad formation probability.



network                  N_a      N_t      ⟨q_a⟩   ⟨q_t⟩   ⟨k⟩    ⟨k_t⟩    r       c       C
empirical                28 526   49 330   5.08    2.94    16.0   56.3     0.250   0.370   0.674
sublinear, p_TF = 0.0    25 497   49 000   6.65    3.46    13.8   36.1     0.13    0.10    0.51
sublinear, p_TF = 0.9    24 650   49 000   6.92    3.48    14.1   107.6    0.15    0.18    0.66
team assembly            24 477   49 000   6.95    3.47    13.9   33.2     0.26    0.32    0.65

TABLE II: The basic parameters of the simulated networks compared with the empirical condensed matter collaboration network.



VI. NUMERICAL RESULTS 

In the simulations reported in this section, the number of ties in a simulated network is always N_t = 49 000, and the fraction of new social actors in a given tie is p_new = 0.202. The same parameters are also used in simulations of the team assembly model, i.e. p = 1 − p_new, and the simulation is run until N_t teams have been created. These parameter values come directly from the empirical measurements of the condensed matter collaboration network. In the simulations of the team assembly model, the probability of selecting a previous collaborator of an incumbent is q = p_TF = 0.9 unless otherwise mentioned. As will be seen below, this choice gives the correct order of magnitude for the clustering of the resulting graph. In both models, the number n of actors in a tie is drawn from the same probability distribution, which corresponds to the empirically measured one (see Fig. El). In addition, the aging mechanism is omitted in both models, i.e. the parameter τ of the team assembly model and the parameter q_0 of the model of Ramasco et al. are set to infinity. This appears to be justified since the ubiquitous characteristics, the ANND and the k-dependent clustering C(k), show no empirically observable dependence on the network age. Indeed, these are similar for both the movie network (in which aging surely has taken place) and the physics collaboration networks, for which the data cover a short interval and aging plays little if any role. All the simulation results are from a single simulation run, i.e. from one network. To check the validity of this approach, we ran several simulations with the same parameters. They are practically indistinguishable from each other, and thus we conclude that using a single run is justified.

FIG. 9: An illustration of the one-mode projection and the addition of a new tie in the model with sublinear preferential attachment. Above, the filled circles denote social ties, the open ones social actors, and the lines the links between them. Below, each social actor is drawn again, now with the links of the one-mode projection visible, i.e. two actors are connected if they participate in the same tie. (a) The network before the addition. For example, the leftmost actor has the bipartite degree q_a = 2 and the one-mode projected degree k = 4. Similarly, the leftmost tie has q_t = 3 and k_t = 2. (b) The network after the addition. The corresponding one-mode projections onto actors (ties) are drawn below (above) the bipartite networks. The new tie (the rightmost filled circle) introduced one new actor (the grey circle), who acquired two links to pre-existing actors (blue lines). The new tie also created a new connection between two pre-existing actors (red line). The new tie is connected to both existing ties: once to the leftmost one (one common actor) and twice to the middle one (two common actors). The same growth scheme also applies to the model of Ramasco et al. and to the team assembly model ITe1|.

The basic parameters of the simulated networks, compared with the empirical condensed matter data, are shown in Table II. The parameters that are either input to the models or depend straightforwardly on them, such as the numbers of nodes of different kinds and the average degrees in the bipartite network and in the one-mode projection onto actors, are naturally reproduced quite well. On the other hand, already the average degree ⟨k_t⟩ in the one-mode projection onto ties shows discrepancies between the simulations and the data. It appears that p_TF could be used in the sublinear model to tune this value, but in this work the role of the tuning target is played by the average clustering C. Regarding the clustering and correlations, the best numerical agreement is given by the team assembly model.
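The averages of Table II can be computed directly from a list of ties. The sketch below is an illustration under assumptions: ties are given as lists of integer actor identifiers, and k_t is taken to be the number of other ties sharing at least one actor with a given tie, as in the example of Fig. 9. The function name `basic_parameters` is hypothetical.

```python
from collections import defaultdict


def basic_parameters(ties):
    """Table II style averages from a list of ties (each a list of actor ids)."""
    ties = [set(t) for t in ties]
    ties_of = defaultdict(set)  # actor -> indices of the ties it belongs to
    for t, members in enumerate(ties):
        for a in members:
            ties_of[a].add(t)
    n_a, n_t = len(ties_of), len(ties)
    # k: number of distinct co-actors (projection onto actors)
    k = [len(set().union(*(ties[t] for t in ts)) - {a}) for a, ts in ties_of.items()]
    # k_t: number of other ties sharing at least one actor (projection onto ties)
    k_t = [len(set().union(*(ties_of[a] for a in members)) - {t})
           for t, members in enumerate(ties)]
    return {
        "N_a": n_a,
        "N_t": n_t,
        "q_a": sum(len(ts) for ts in ties_of.values()) / n_a,  # ties per actor
        "q_t": sum(len(m) for m in ties) / n_t,                # actors per tie
        "k": sum(k) / n_a,
        "k_t": sum(k_t) / n_t,
    }
```

Since both bipartite averages count the same actor-tie links, N_a⟨q_a⟩ = N_t⟨q_t⟩ must hold, which gives a quick consistency check; for the empirical row of Table II, 28 526 × 5.08 and 49 330 × 2.94 both come to about 1.45 × 10^5.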

The degree distributions in the one-mode projection onto actors are plotted in Fig. 10 for the empirically measured condensed matter collaboration network and for simulations of the sublinear model with different values of α, together with a simulation of the team assembly model. The simulation of the sublinear model (with or without triad formation) with α equal to the experimentally measured effective value (Fig. 5) mimics the empirical degree distribution significantly better than the one using the linear PA rule. The latter leads to a scale-free degree distribution, as predicted 0]. The team assembly model is also capable of reproducing the empirical degree distribution reasonably well. In the rest of this paper, the value α = 0.75 is used unless otherwise mentioned. Comparing the other networks studied empirically in this work with corresponding simulations yields the same behavior.

The degree-dependent clustering C(k) of Eq. JSJ in the one-mode projection is plotted in Fig. 11 for the condensed matter collaboration network and for simulations of both the sublinear model and the team assembly model. The figure shows that the sublinear model without triad formation differs notably from the empirical data, whereas the sublinear model with a high probability of triad formation and the team assembly model give the correct order of magnitude for the overall clustering (see also Table II), but the form of the C(k) curve differs from the empirical one. In this respect, the sublinear model does slightly better than the team assembly model.
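A sketch of how C(k) could be measured from the one-mode projection, stored as a dict mapping each actor to the set of its neighbours, is given below. It assumes the standard local clustering c_i = 2e_i / [k_i(k_i − 1)], with e_i the number of links among the k_i neighbours of node i, averaged over all nodes of degree k; nodes with k < 2 are skipped. The function name is hypothetical, and the paper's own definition is the one in the equation cited above.

```python
from collections import defaultdict


def clustering_spectrum(neighbours):
    """Return {k: C(k)}, the local clustering averaged over nodes of degree k >= 2."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, nb in neighbours.items():
        k = len(nb)
        if k < 2:
            continue  # local clustering is undefined for k < 2
        # e_i: links among the neighbours of i, each pair counted once
        links = sum(1 for a in nb for b in nb if a < b and b in neighbours[a])
        sums[k] += 2 * links / (k * (k - 1))
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}
```

For a triangle with one extra node attached to a corner, the two degree-2 corners give C(2) = 1 and the degree-3 corner gives C(3) = 1/3.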

The average nearest-neighbour degree k_nn(k) (ANND) is plotted in Fig. 12 for the condensed matter empirical data set and for simulations of both models. The figure shows that the team assembly model reproduces the correlation structure of the empirical network reasonably well in the intermediate-k regime: both appear to scale roughly as k_nn(k) ~ k^β with β ≈ 0.3 and agree approximately in amplitude. However, the simulation differs from the data at both low and high values of k. On the other hand, the simulations of the sublinear model show a similar scaling but with a different, smaller exponent (β ≈ 0.15).
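The ANND and its exponent β can be estimated along the following lines. This is a sketch under assumptions: the projection is a dict of neighbour sets, and β is taken as the least-squares slope of log k_nn versus log k over a chosen range [k_min, k_max], mirroring the restriction to the intermediate-k regime in the text; the function names and the fitting procedure are illustrative rather than the one used for the figures. For cross-checking, networkx offers `average_degree_connectivity`, which computes k_nn(k).

```python
import math
from collections import defaultdict


def annd(neighbours):
    """k_nn(k): mean degree of the neighbours of nodes with degree k."""
    sums, counts = defaultdict(float), defaultdict(int)
    for i, nb in neighbours.items():
        k = len(nb)
        if k == 0:
            continue
        sums[k] += sum(len(neighbours[j]) for j in nb) / k
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


def fit_beta(knn, k_min=1, k_max=float("inf")):
    """Least-squares slope of log k_nn(k) versus log k for k_min <= k <= k_max."""
    pts = [(math.log(k), math.log(v)) for k, v in knn.items() if k_min <= k <= k_max]
    n = len(pts)
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    return sxy / sxx
```

A positive slope corresponds to assortative mixing; for a star graph, `annd` returns k_nn(1) = 3 and k_nn(3) = 1, the signature of negative correlations.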

FIG. 10: The degree distribution in the one-mode projection onto actors in the condensed matter collaboration network and in simulations of the sublinear model with different values of the PA exponent α and of the team assembly model with p_TF = 1.0. The curve for α = 1.0 leads to a power-law degree distribution, as expected Urt, whereas those for α = 0.75 and for the team assembly model are clearly much closer to the empirical values. In the sublinear model, the behavior is the same even for p_TF = 0 (not shown).

FIG. 11: The k-dependent clustering (Eq. JjjJ) compared with empirical measurements for both the sublinear model (α = 0.75) and the team assembly model. As expected, the sublinear model at p_TF = 0 cannot reproduce the clustering, whereas both the team assembly model and the sublinear model at p_TF = 0.9 do somewhat better. However, none of the models agrees fully with the empirical data.

FIG. 12: The average nearest-neighbour degree (ANND) in simulations of the sublinear model (α = 0.75) and the team assembly model compared with the empirical measurements. The results from the team assembly model are in reasonable agreement with the real data, whereas those from the sublinear model roughly scale as a power of k but with a different exponent. The solid and dashed lines are guides to the eye.

FIG. 13: The average nearest-neighbour degree (ANND) in simulations using different values of the preferential attachment exponent α. For all α the ANND scales as a power of the node degree k, k_nn(k) ~ k^β, so that for α = 1, β = 0, i.e. no degree-degree correlations are observed, and for α < 1, β grows as α decreases, leading to positive correlations. The inset shows the dependence of β on α.

To study the effect of the exponent α on the scaling of the ANND in the sublinear model, the ANND is shown in Fig. 13 for several values of α. For α = 1, there are no degree-degree correlations at all, corresponding to the model of Ramasco and co-workers without aging. For lower values of α, positive correlations are present and the ANND scales as a power law of the vertex degree k, as above. The value of β depends continuously on α, as seen in the inset of Fig. 13. However, the numerical value of the exponent β is notably lower in the relevant region 0.6 < α < 0.8 than the experimentally observed one. Thus, the overall correlations in this case are not as strong as in the empirical data. The same conclusion can also be reached by considering the values of the assortativity coefficient r in Table II.
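The assortativity coefficient r used in Table II can be computed as the Pearson correlation of the degrees at the two ends of each link. The sketch below assumes the standard definition of Newman and a projection stored as a dict of neighbour sets with comparable (e.g. integer) node labels; the function name is hypothetical. networkx provides `degree_assortativity_coefficient` for a cross-check.

```python
def assortativity(neighbours):
    """Degree assortativity r of an undirected graph given as {node: set(neighbours)}."""
    # degrees (j, k) at the two ends of every link, each link counted once
    edges = [(len(neighbours[a]), len(neighbours[b]))
             for a in neighbours for b in neighbours[a] if a < b]
    m = len(edges)
    s_prod = sum(j * k for j, k in edges) / m              # <j k>
    s_mean = sum((j + k) / 2 for j, k in edges) / m        # <(j + k) / 2>
    s_sq = sum((j * j + k * k) / 2 for j, k in edges) / m  # <(j^2 + k^2) / 2>
    return (s_prod - s_mean ** 2) / (s_sq - s_mean ** 2)
```

A star graph gives r = −1 (perfectly dissortative), and a path of four nodes gives r = −0.5; positive values such as those in Table II indicate that well-connected actors tend to collaborate with each other. The formula is undefined for regular graphs, where all degrees are equal.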

FIG. 14: The scaling of the ANND for different values of the triad formation (TF) probability in simulations of the sublinear model. Introducing a TF mechanism does not change the scaling. In addition to the results in the figure (α = 0.7), we have checked this with several other values of α, with the same results. The same also applies to the team assembly model (results not shown).

FIG. 15: The retroactively measured preferential attachment rule for a simulation of the team assembly model. The rule is approximately a time-independent sublinear power law with α ≈ 0.45 and with a cut-off. The solid line is a guide to the eye.

FIG. 16: The degree distribution in the one-mode projection onto ties in the empirical condensed matter data set, compared with simulations. A considerable difference exists between the simulations and the data.

FIG. 17: The average nearest-neighbour degree in the one-mode projection onto ties in the empirical condensed matter data set and in the simulations. The simulations are unable to reproduce the empirically observed power-law scaling. Quantitatively, the team assembly model mimics the behavior of the data better than the sublinear model.

To see how the triad formation affects the scaling of the ANND, the ANND of the sublinear model is plotted for several values of p_TF in Fig. 14. The figure clearly shows that the triad formation process has no effect at all. We have also performed a corresponding series of simulations with several different values of α; in all cases, the conclusion remains the same. Simulations of the team assembly model revealed the same behaviour. Thus, we conclude that this kind of process cannot be held responsible for the observed correlations. Again, comparing the assortativity coefficient r in Table II for the sublinear model with and without triad formation supports this conclusion.

The retroactively measured preferential attachment rule for the team assembly model is plotted in Fig. 15. The rule is clearly, again, a sublinear power law with α ≈ 0.4 and with a cut-off, very similar to the empirical ones (cf. Fig. 8) and to those in the simulations of the sublinear model (not shown). This is surprising since, at first sight, one could anticipate that the combination of attachment rules in the team assembly model leads to a compound rule of the form k + A. However, the bipartiteness comes into play, and this phenomenon shows that it can indeed substantially affect the network structure. It can also be seen that the effective attachment rule is time-independent, as is the case for the empirical data.
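The paper's own procedure for measuring the PA rule retroactively is the one of Sec. III D and is not reproduced here. As a swapped-in stand-in, the sketch below uses a Newman-style histogram estimator: the ties are replayed in time order, and every time an incumbent of projected degree k joins a new tie, 1/n_k(t) is added to A(k), where n_k(t) is the number of existing actors with degree k at that moment. Under the assumption that the normalization of the attachment probability varies slowly, A(k) is then proportional to the attachment rule Π(k). The function name is hypothetical.

```python
from collections import Counter, defaultdict


def measure_pa(ties):
    """Estimate A(k), proportional to Pi(k), from a time-ordered list of ties."""
    neighbours = defaultdict(set)  # one-mode projection onto actors, built up over time
    n_of_k = Counter()             # number of existing actors with projected degree k
    A = defaultdict(float)
    for members in ties:
        members = list(dict.fromkeys(members))  # drop repeated ids, keep order
        incumbents = [a for a in members if a in neighbours]
        for a in incumbents:       # each incumbent was "selected" at its current degree
            k = len(neighbours[a])
            A[k] += 1.0 / n_of_k[k]
        for a in members:          # update degrees and the degree histogram
            if a in neighbours:
                n_of_k[len(neighbours[a])] -= 1
            neighbours[a].update(b for b in members if b != a)
            n_of_k[len(neighbours[a])] += 1
    return dict(sorted(A.items()))
```

Fitting log A(k) against log k for k ≥ 1 then gives an estimate of the effective exponent α; a time-independent rule shows up as the same slope when the estimator is applied separately to early and late portions of the data.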

Next we compare the models using the one-mode projection onto social ties instead of social actors. The degree distribution and the average nearest-neighbour degree in this projection are shown in Figs. 16 and 17, respectively. From Fig. 16 one can see that the characteristic plateau of the scientist collaboration networks is not captured by any of the simulations. Similar conclusions can be drawn from Fig. 17: none of the models leads to the power-law scaling that we have observed empirically. There are also considerable differences in the overall magnitudes of the quantities, in which respect the team assembly model performs best.



VII. SUMMARY AND DISCUSSION 

In this paper, we have analyzed several bipartite collaboration networks empirically. Their static and dynamic structure makes them one of the conceptually simplest examples of complex networks in which effective statistical laws seem to exist. The quantitative description of such laws by models then becomes the (elusive) goal. These systems are very clear-cut in that the old graph structure is static (old vertices and edges are not removed) and the growth events are easy to follow in the data and to quantify by various measures.

Concerning the correlation structure of collaboration networks, the most important empirical observation is that the average nearest-neighbour degree (ANND) in the one-mode projection onto social actors scales as a power of the node degree, k_nn(k) ~ k^β with β ≈ 0.3. Similar scaling is also present in the projection onto social ties, i.e. articles or movies. The clustering of the one-mode network(s) is considerable. The effective preferential attachment (PA) rule in the actor projection appears to be a time-independent sublinear power law.

We have also introduced a model built on this observation in an attempt to explain the observed properties of the networks: the empirically observed sublinearity of the PA rule has been included in a numerical model. In this model, the ANND indeed scales as a power of k, but the numerical values of the exponents do not match. In any case, the model demonstrates that the form of the PA rule can substantially affect the correlation structure.

Another model, of team assembly mechanisms [15], has also been simulated. Its ANND seems to fit the (actor) empirical observations reasonably well: both roughly scale as k^β with β ≈ 0.3. A common feature of both models is that they reproduce the degree distribution in the one-mode projection rather well (see Fig. 10). In the case of the sublinear model, using the correct, empirically measured value of the exponent α is necessary for this result. On the other hand, the team assembly model fails to reproduce the empirical attachment rule. The sublinear model without any triad formation fails to reproduce the form of the k-dependent clustering, whereas the team assembly model and the sublinear model with a considerable probability of triad formation give the correct order of magnitude of the average clustering C, seen also in the overall magnitude of C(k). However, the models do not explain its functional form. Note that a triad formation process does not change the correlations in the models studied here.

Considering the one-mode projection onto social ties instead of actors reveals the inadequacy of both models. The empirically measured degree distribution and average nearest-neighbour degree both differ from their simulated counterparts. In effect, we have observed that even though various models can reproduce some properties of the projections onto actors, they lack explanatory power when the networks are considered with their full bipartite structure intact. Perhaps one should consider tie-based growth rules instead of actor-based ones.

Summarizing, there is a clear need for a more complex bipartite growth model that accounts for both the clustering and the correlations of actors (authors) and ties (articles). Since the bipartite structure changes through events in which one tie is introduced together with several actors, the effective choice of old actors must follow from a rule that measures the correlation structure in more detail. One candidate would be to use k-connected cliques, in analogy to recent observations of the role of such cliques in network superstructure [33]. This would allow various ways of measuring the joint strength of interaction between old actors. Furthermore, the recently introduced concept of social inertia [34] might be of use in this respect by providing a quantitative, time-dependent measure. It is also clear that there are substructures within subfields. These point towards the idea that actors and ties have "hidden variables" that should be taken into account. One practical prospect would be to use e.g. the PACS indices to classify ties (articles) and actors (authors), and to investigate the role of both ideas above. Note that in all the cases here the "invisible college", or giant component of the one-mode projection onto actors, includes almost all of the actors and is thus trivial. Given this, it is an open question how to define and measure the "success" of an actor, and how to assess the performance of current models in this respect; simple membership is not enough. Again, progress could possibly be made by the use of weighted networks.

Even though several sources of positive degree-degree correlations have been demonstrated here, open questions related to them remain. Most importantly, the origin of the specific form of the correlations remains unknown. Perhaps one needs to define more informative quantities for measuring the structure of the original bipartite network. Studies of how the form of the (one-mode) PA rule depends on the underlying elementary social phenomena offer interesting avenues for future work.